
  • Different results when using the operator "|" or assigning the same variables a dummy

    Dear members,
    I apologise in advance for the stupidity of my question, but I can't understand where the problem lies. I have a dataset of measurements of one parameter in subjects; each subject undergoes measurements while supine with no device, then measurements using one specific device (these alternate: supine/one device, then supine/second device, and so on). There are only three devices. I also created a dummy variable "sup_preox", where the value 1 is given to the supine sessions.
    In a reduced version this looks like this:
    id session device sup_preox EIT_POST
    124 Baseline Supine Ambu Peep 1 43
    124 At 3 min Ambu Peep 0 53
    124 Baseline Supine 2 Face Mask 1 49
    124 At 3 min sess 2 Face Mask 0 51
    124 Baseline Supine sess 3 Ambu 1 53
    124 At 3 min sess 3 Ambu 0 58
    127 Baseline Supine Face Mask 1 27
    127 At 3 min Face Mask 0 27
    127 Baseline Supine 2 Ambu Peep 1 37
    127 At 3 min sess 2 Ambu Peep 0 35
    127 Baseline Supine sess 3 Ambu 1 31
    127 At 3 min sess 3 Ambu 0 31

    Now, I want to compare the EIT_POST value at supine with the one with each device, but if I ask Stata to describe the median value of the three supine sessions with the operator "|", I get a different answer than when asking Stata to give me this result with the dummy variable.
    Code:
    sum EIT_POST if session==1|session==5|session==9, d
    sum EIT_POST if sup_preox==1, d

    In fact these commands should be describing the same data so I don't get why I get different results.
    Thanks in advance for the help
    Ana (Stata v. 18 BE)

  • #2
    The data display you have provided is extremely difficult to work with. I think I have figured out what your data set looks like, but there was some (educated) guesswork involved. I think it's this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int id str24 session str11 device byte(sup_preox eit_post)
    124 "Baseline Supine"        "Ambu Peep" 1 43
    124 "At 3 min"               "Ambu Peep" 0 53
    124 "Baseline Supine 2"      "Face Mask" 1 49
    124 "At 3 min sess 2"        "Face Mask" 0 51
    124 "Baseline Supine sess 3" "Ambu"      1 53
    124 "At 3 min sess 3"        "Ambu"      0 58
    127 "Baseline Supine"        "Face Mask" 1 27
    127 "At 3 min"               "Face Mask" 0 27
    127 "Baseline Supine 2"      "Ambu Peep" 1 37
    127 "At 3 min sess 2"        "Ambu Peep" 0 35
    127 "Baseline Supine sess 3" "Ambu"      1 31
    127 "At 3 min sess 3"        "Ambu"      0 31
    end
    Run the above and verify that what I have there is correct, as otherwise further discussion is likely to be fruitless or worse.

    Assuming that's correct, we immediately encounter a problem: your code includes clauses like -session == 1|session == 5|session==9-. But those are impossible conditions with this data set because session is a string variable and cannot take on values like 1, 5, or 9. Perhaps in your real data set, what you have is a numeric session variable that has a value label that assigns labels like "Baseline Supine," etc. to numeric values that include 1, 5, or 9. But there is absolutely nothing in what you show that indicates what the mapping from numeric values to labels might be. What I can show you is this:
    Code:
                          |       sup_preox
                  session |         0          1 |     Total
    ----------------------+----------------------+----------
        "At 3 min sess 2" |         2          0 |         2
        "At 3 min sess 3" |         2          0 |         2
               "At 3 min" |         2          0 |         2
      "Baseline Supine 2" |         0          2 |         2
    "Baseline Supine se.. |         0          2 |         2
        "Baseline Supine" |         0          2 |         2
    ----------------------+----------------------+----------
                    Total |         6          6 |        12
    This tells me that sup_preox == 1 corresponds exactly to all sessions that contain "Baseline Supine" in their label, and sup_preox == 0 corresponds exactly to all sessions that contain "At 3 min" in their label. Moreover, if I -summarize- eit_post in the corresponding ways, everything checks out:
    Code:
    . summ eit_post if sup_preox == 0
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
        eit_post |          6        42.5     13.0499         27         58
    
    . summ eit_post if substr(session, 1, 8) == "At 3 min"
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
        eit_post |          6        42.5     13.0499         27         58
    
    .
    . summ eit_post if sup_preox == 1
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
        eit_post |          6          40    10.17841         27         53
    
    . summ eit_post if substr(session, 1, 8) == "Baseline"
    
        Variable |        Obs        Mean    Std. dev.       Min        Max
    -------------+---------------------------------------------------------
        eit_post |          6          40    10.17841         27         53
    So, in other words, I can't replicate your problem. I suspect the difficulty you have is that you believe that the numeric session codes for sup_preox == 1 correspond to 1, 5, and 9, but that is not actually the case. But as the example data you show provides no direct information about the labeling of session, I can't say anything more definitive than that.
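
    One way to check which numeric codes sit behind the session labels is sketched below. This is an illustration only: the codes 1 to 6 and the label name sesslbl are invented for the example and are not taken from the original poster's data set.

    ```stata
    * Toy data: the session codes 1-6 and the label name sesslbl are assumptions
    clear
    input byte(session sup_preox)
    1 1
    2 0
    3 1
    4 0
    5 1
    6 0
    end
    label define sesslbl 1 "Baseline Supine" 2 "At 3 min"          ///
                         3 "Baseline Supine 2" 4 "At 3 min sess 2" ///
                         5 "Baseline Supine sess 3" 6 "At 3 min sess 3"
    label values session sesslbl

    * List the mapping from numeric codes to labels
    label list sesslbl

    * Cross-tabulate using the underlying codes instead of the labels
    tabulate session sup_preox, nolabel
    ```

    In a real data set, -describe session- reports the name of the attached value label, which can then be passed to -label list-.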

    To avoid this kind of problem, in the future, when showing example data, always use the -dataex- command, as I have done here. Had you done that, the labeling of session would have been shown and we could reach a more definite conclusion.

    If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    • #3
      Thank you very, very much for your help! Yes, first of all, "session" is a numeric variable with value labels. I remembered there was a correct way to show my data here but could not remember which one; now that you mention it, I will use "dataex" for any new problems I encounter. In the meantime, I think I understand where the problem was: I also have 3 different patient groups, and so my command was, for example,
      sum eit_post if group ==1 & session ==5|session==1|session==9
      I read that the operator & takes precedence over the operator |, and that I should instead have typed
      sum eit_post if group ==1 & session ==5| group ==1 &session==1| group ==1 & session==9

      In this way, I got the same results I had when substituting all three sessions with a dummy variable

      Thank you very much for your kindness
      Anna



      • #4
        Code:
        sum eit_post if group ==1 & inlist(session, 1, 5, 9)

        See e.g. https://journals.sagepub.com/doi/pdf...867X1101100308
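
        A minimal sketch of why the unparenthesized condition in #3 selects extra observations, using made-up toy data (the values below are invented for illustration and are not the poster's data):

        ```stata
        * Toy data, assumed for illustration only
        clear
        input byte(group session) int eit_post
        1 1 43
        1 5 49
        1 9 53
        1 2 53
        2 1 27
        2 5 37
        2 9 31
        end

        * & binds tighter than |, so this is parsed as
        * (group == 1 & session == 5) | session == 1 | session == 9
        * and also picks up group 2 observations with session 1 or 9
        count if group == 1 & session == 5 | session == 1 | session == 9

        * Parentheses apply the group restriction to all three sessions
        count if group == 1 & (session == 5 | session == 1 | session == 9)

        * Same selection, written more compactly with inlist()
        count if group == 1 & inlist(session, 1, 5, 9)
        ```

        With these toy data the first -count- reports 5 observations, while the second and third both report 3.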



        • #5
          Thank you!!
